Getting a subset of a data structure

Problem

You want to get a subset of the elements of a vector, matrix, or data frame.

Solution

To get a subset based on a conditional criterion, you can use the subset() function or index with square brackets. The examples here show both ways.

# A sample vector
v <- c(1,4,4,3,2,2,3)

subset(v, v<3)
v[v<3]
# 1 2 2

# Another vector
t <- c("small", "small", "large", "medium")

# Remove "small" entries
subset(t, t!="small")
t[t!="small"]
# "large"  "medium"
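Both forms rely on the same mechanism. The condition v<3 evaluates to a logical vector with one TRUE or FALSE per element, and only the elements at TRUE positions are kept. The short sketch below stores that vector, inspects it, and passes it to both methods. The name keep is just an illustrative variable.

```r
v <- c(1,4,4,3,2,2,3)

# The condition is an ordinary logical vector, one value per element of v
keep <- v < 3
keep
# TRUE FALSE FALSE FALSE TRUE TRUE FALSE

# The stored vector can be reused with either method
v[keep]
# 1 2 2
identical(subset(v, keep), v[keep])
# TRUE
```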

One important difference between the two methods is that you can assign values to elements with square bracket indexing, but you cannot with subset().

v[v<3] <- 9
# 9 4 4 3 9 9 3

subset(v, v<3) <- 9
# Error in subset(v, v < 3) <- 9 : could not find function "subset<-"
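If you want a modified copy instead of changing v in place, base R's replace() function accepts the same kind of logical index. A minimal sketch, where w is just an illustrative name:

```r
v <- c(1,4,4,3,2,2,3)

# replace(x, list, values) returns a copy of x with x[list] set to values
w <- replace(v, v<3, 9)
w
# 9 4 4 3 9 9 3

# The original vector is left unchanged
v
# 1 4 4 3 2 2 3
```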

With data frames:

# A sample data frame
data <- read.table(header=TRUE, text='
 subject sex size
       1   M    7
       2   F    6
       3   F    9
       4   M   11
 ')

subset(data, subject < 3)
data[data$subject < 3, ]
# subject sex size
#       1   M    7
#       2   F    6

# Subset of particular rows and columns
subset(data, subject < 3, select = -subject)
subset(data, subject < 3, select = c(sex,size))
subset(data, subject < 3, select = sex:size)
data[data$subject < 3, c("sex","size")]
# sex size
#   M    7
#   F    6
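The two methods can return different types when you keep only one column. With brackets, data[rows, "size"] drops the result to a plain vector, while subset() keeps a data frame because its drop argument defaults to FALSE. The sketch below recreates the same sample data; a and b are illustrative names.

```r
# Same sample data frame as above
data <- read.table(header=TRUE, text='
 subject sex size
       1   M    7
       2   F    6
       3   F    9
       4   M   11
 ')

# Brackets with a single column drop the result to a vector
a <- data[data$subject < 3, "size"]
a
# 7 6

# subset() keeps a one-column data frame
b <- subset(data, subject < 3, select = size)
class(b)
# "data.frame"

# drop=FALSE makes the bracket version keep a data frame too
data[data$subject < 3, "size", drop=FALSE]
```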

# Logical AND of two conditions
subset(data, subject < 3  &  sex=="M")
data[data$subject < 3  &  data$sex=="M", ]
# subject sex size
#       1   M    7

# Logical OR of two conditions
subset(data, subject < 3  |  sex=="M")
data[data$subject < 3  |  data$sex=="M", ]
# subject sex size
#       1   M    7
#       2   F    6
#       4   M   11
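Another difference appears when the condition contains NA values. subset() treats an NA in the condition as FALSE and drops that row, while square-bracket indexing returns a row filled with NA for it. Wrapping the condition in which(), which returns only the positions that are TRUE, gives the same rows as subset(). A small sketch using a hypothetical data frame d with one missing size:

```r
# Hypothetical data with a missing value
d <- data.frame(subject = 1:4, size = c(7, NA, 9, 11))

# subset() drops rows where the condition is NA: subjects 3 and 4 remain
subset(d, size > 8)

# Brackets return an extra all-NA row where the condition is NA
d[d$size > 8, ]

# which() discards the NA, matching subset()
d[which(d$size > 8), ]
```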

# Condition based on transformed data
subset(data, log2(size) > 3)
data[log2(data$size) > 3, ]
# subject sex size
#       3   F    9
#       4   M   11

# Subset if elements are in another vector
subset(data, subject %in% c(1,3))
data[data$subject %in% c(1,3), ]
# subject sex size
#       1   M    7
#       3   F    9
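Matrices work with both methods too, with one catch. For a matrix, subset() evaluates the condition as ordinary R code instead of looking up column names, so you refer to a column as m[, "a"]; only the select argument accepts bare column names. As with data frames, subset() keeps a matrix even when a single column is selected. A sketch with a made-up matrix m:

```r
# Hypothetical 3x2 matrix with named columns a and b
m <- matrix(c(1, 5, 3, 8, 2, 7), nrow = 3,
            dimnames = list(NULL, c("a", "b")))

# Rows where column a is greater than 2
subset(m, m[, "a"] > 2)
m[m[, "a"] > 2, ]
#      a b
# [1,] 5 2
# [2,] 3 7

# select= takes column names; the result stays a one-column matrix
subset(m, m[, "a"] > 2, select = b)
```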

Notes

See also the recipe Indexing into a data structure.